Karma2: Provenance Management for Data-Driven Workflows
Abstract
The increasing ability of the sciences to sense the world around us is creating a growing need for data-driven applications controlled by workflows composed of services on the Grid. Our work focuses on collecting provenance for these workflows, which is necessary to validate a workflow and to determine the quality of the data products it generates. The challenge we address is to record uniform, usable provenance metadata that meets domain needs while minimizing both the modification burden on service authors and the performance overhead on the workflow engine and the services. The framework generates discrete provenance activities during the lifecycle of a workflow execution. These activities can be aggregated into complex data and process provenance graphs that may span multiple workflows. The implementation propagates the activities through a loosely coupled publish-subscribe architecture, and the system's capabilities satisfy the requirements of detailed provenance collection. A performance evaluation of a prototype shows minimal overhead: roughly 1% for an eight-service workflow using 271 data products.
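The pattern described in the abstract (services emit discrete provenance activities, a publish-subscribe channel carries them, and a collector aggregates them into a provenance graph that can cross workflow boundaries) can be illustrated with a small Python sketch. Everything in it is a hypothetical stand-in chosen for illustration: the `Activity` schema, the `"consumed"`/`"produced"` activity kinds, the in-memory `Broker`, and the `ProvenanceStore` aggregator are not Karma2's actual API or message format, and a real deployment would use a networked notification broker rather than in-process callbacks. The sketch also assumes each service runs at most once per workflow, so a `(workflow_id, service_id)` pair identifies one invocation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Activity:
    """One discrete provenance activity (hypothetical schema)."""
    workflow_id: str
    service_id: str
    kind: str      # hypothetical kinds: "consumed" or "produced"
    data_id: str   # identifier of the data product involved


class Broker:
    """Hypothetical in-memory stand-in for a publish-subscribe broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Activity], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Activity], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, activity: Activity) -> None:
        # Publishers never reference subscribers directly: loose coupling.
        for handler in self._subscribers[topic]:
            handler(activity)


class ProvenanceStore:
    """Aggregates activities into a data/process provenance graph."""

    def __init__(self) -> None:
        # Invocation (workflow_id, service_id) -> data products it consumed.
        # Assumes one invocation per service per workflow.
        self.inputs: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
        # Data product -> invocation that produced it.
        self.producer: Dict[str, Tuple[str, str]] = {}

    def on_activity(self, a: Activity) -> None:
        invocation = (a.workflow_id, a.service_id)
        if a.kind == "consumed":
            self.inputs[invocation].add(a.data_id)
        elif a.kind == "produced":
            self.producer[a.data_id] = invocation

    def lineage(self, data_id: str) -> Set[str]:
        """All upstream data products of data_id, across workflows."""
        seen: Set[str] = set()
        stack = [data_id]
        while stack:
            current = stack.pop()
            invocation = self.producer.get(current)
            if invocation is None:
                continue  # no recorded producer: treat as a raw input
            for parent in self.inputs[invocation]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Usage with hypothetical workflows, services, and data products.
broker = Broker()
store = ProvenanceStore()
broker.subscribe("provenance", store.on_activity)

broker.publish("provenance", Activity("wf-A", "regrid", "consumed", "raw.dat"))
broker.publish("provenance", Activity("wf-A", "regrid", "produced", "grid.dat"))
# Workflow B reuses wf-A's output, so the lineage spans both workflows.
broker.publish("provenance", Activity("wf-B", "analyze", "consumed", "grid.dat"))
broker.publish("provenance", Activity("wf-B", "analyze", "produced", "result.dat"))

print(sorted(store.lineage("result.dat")))  # ['grid.dat', 'raw.dat']
```

Because the services only publish activities and never call the store, collectors can be added or replaced without modifying service code. This mirrors, in miniature, the abstract's goal of keeping the modification burden on service authors low.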
Related articles
A Provenance-Integration Framework for Distributed Workflows in Grid Environments
Provenance information about complex and distributed workflows is a key issue for data quality control and data reliability maintenance in reservoir management. Distributed and integrated environments where different workflows consume and transform data require a comprehensive provenance view. In this scenario provenance collection and integration presents significant challenges. In this paper,...
A Data Model for Analyzing User Collaborations in Workflow-Driven e-Science
Scientific discoveries are often the result of methodical execution of many interrelated scientific workflows, where workflows and datasets published by one set of users can be used by other users to perform subsequent analyses, leading to implicit or explicit collaboration. In this paper, we describe a data model for “collaborative provenance” that extends common workflow provenance models by ...
Application of Provenance for Automated and Research Driven Workflows
Provenance has recently become a popular topic for workflow execution environments but it is also relevant to other applications, such as long-running, user-driven "research workflows", problem solving environments, and data streaming (data analysis) environments. This paper presents a number of use cases where provenance can play an important role in understanding how data was derived, how dec...
Performance Evaluation of the Karma Provenance Framework for Scientific Workflows
Provenance about workflow executions and data derivations in scientific applications help estimate data quality, track resources, and validate in silico experiments. The Karma provenance framework provides a means to collect workflow, process, and data provenance from data-driven scientific workflows and is used in the Linked Environments for Atmospheric Discovery (LEAD) project. This paper pre...
Tackling the Provenance Challenge one layer at a time
VisTrails is a new workflow and provenance management system that provides support for scientific data exploration and visualization. Whereas workflows have been traditionally used to automate repetitive tasks, for applications that are exploratory in nature, change is the norm. VisTrails uses a new change-based provenance mechanism which was designed to handle rapidly-evolving workflows. It un...
Journal:
- Int. J. Web Service Res.
Volume 5
Publication date: 2008